Cocktail Party Problem


New Hearing Aid Company, Fortell, Brings in Steve Martin and Others as Fans

WIRED

Well, Who Do You Know? AI-powered startup Fortell has become a secret handshake for the privileged hearing-impaired crowd who swear by the product. Now, it wants to be in your ears. A secret is percolating at dinner parties, salons, and cocktail gatherings among the august New York City elite. It's whispered in the circles of financial masters of the universe, Hollywood stars, and owners of sports teams. Many haven't heard it--or if they did, they might not have made out the words through the noisy cross-conversation. Once they do know--particularly if they're boomers--they want it desperately. Fortell is a hearing aid, one that claims to use AI to provide a dramatically superior aural experience. The chosen few included in its beta test say it outperforms the high-end devices they'd been unhappily using. These testers have made pilgrimages to Fortell's headquarters on the fifth floor of a WeWork facility in New York City's trendy SoHo neighborhood, where they were fitted for the hearing aids--which from the outside look pretty much like standard, over-the-ear, teardrop-shaped devices. But the big moment comes when a Fortell staffer takes them down to street level.


Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment

von Kügelgen, Julius

arXiv.org Machine Learning

Causal models provide rich descriptions of complex systems as sets of mechanisms by which each variable is influenced by its direct causes. They support reasoning about manipulating parts of the system and thus hold promise for addressing some of the open challenges of artificial intelligence (AI), such as planning, transferring knowledge in changing environments, or robustness to distribution shifts. However, a key obstacle to more widespread use of causal models in AI is the requirement that the relevant variables be specified a priori, which is typically not the case for the high-dimensional, unstructured data processed by modern AI systems. At the same time, machine learning (ML) has proven quite successful at automatically extracting useful and compact representations of such complex data. Causal representation learning (CRL) aims to combine the core strengths of ML and causality by learning representations in the form of latent variables endowed with causal model semantics. In this thesis, we study and present new results for different CRL settings. A central theme is the question of identifiability: Given infinite data, when are representations satisfying the same learning objective guaranteed to be equivalent? This is an important prerequisite for CRL, as it formally characterises if and when a learning task is, at least in principle, feasible. Since learning causal models, even without a representation learning component, is notoriously difficult, we require additional assumptions on the model class or rich data beyond the classical i.i.d. setting. By partially characterising identifiability for different settings, this thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations. Ideally, the developed insights can help inform data collection practices or inspire the design of new practical estimation methods.


The Cocktail Party Problem: Speech/Data Signal Separation Comparison between Backpropagation and SONN

Neural Information Processing Systems

This work introduces a new method, the Self-Organizing Neural Network (SONN) algorithm, and compares its performance with backpropagation (BP) in a signal separation application. The problem is to separate two signals: a modem data signal and a male speech signal, added together and transmitted through a 4 kHz channel. The signals are sampled at 8 kHz, and, using supervised learning, an attempt is made to reconstruct them. SONN is an algorithm that constructs its own network topology during training; the resulting network is shown to be much smaller than the BP network, faster to train, and free from the trial-and-error network design that characterizes BP.
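The setup in the abstract above, supervised reconstruction of two summed signals sampled at 8 kHz, can be illustrated with a far simpler baseline than either network the paper compares. The sketch below uses hypothetical stand-in signals (a low-frequency tone for the speech, a BPSK-style tone burst for the modem data) and a least-squares FIR filter in place of the SONN or BP networks:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000                      # sampling rate (Hz), as in the abstract
t = np.arange(2 * fs) / fs     # two seconds of signal

# Stand-ins for the two sources: a low-frequency "speech" tone and a
# BPSK-style "modem" signal on a 2 kHz carrier (hypothetical signals).
speech = np.sin(2 * np.pi * 300 * t)
bits = rng.choice([-1.0, 1.0], size=t.size // 8).repeat(8)
data = bits * np.sin(2 * np.pi * 2000 * t)
mixture = speech + data

# Supervised separation with a linear FIR filter fit by least squares
# (a simple baseline, not the SONN or backpropagation networks above).
L = 64
X = np.column_stack([np.roll(mixture, k) for k in range(L)])
X[:L, :] = 0.0                 # discard wrap-around samples from np.roll
w, *_ = np.linalg.lstsq(X, speech, rcond=None)
estimate = X @ w

err_before = np.mean((mixture - speech) ** 2)
err_after = np.mean((estimate - speech) ** 2)
print(err_before, err_after)   # the filter should shrink the error
```

A linear filter suffices here only because the two stand-in signals occupy disjoint frequency bands; real speech and modem signals overlap spectrally, which is what motivates the nonlinear networks in the paper.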


Why Speech Separation is Such a Difficult Problem to Solve

#artificialintelligence

You are talking on the phone, recording audio, or just speaking to a voice assistant like Google Assistant, Cortana, or Alexa. But the person on the other side of the call cannot hear you because you are in a crowded place, the recorded audio has a lot of background noise, or the "Hey, Alexa" wake phrase wasn't picked up by your device because someone else started speaking. All of these problems related to separating voices, informally referred to as the "cocktail party problem", have been addressed using artificial intelligence and deep learning methods in recent years. But still, separating and inferring multiple simultaneous voices is a difficult problem to solve completely. To start, speech separation means extracting the speech of the "wanted speaker" or "speaker of interest" from an overlapping mixture of speech from other speakers, also referred to as "noise".
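The definition above, extracting a speaker of interest from an overlapping mixture, is commonly framed as predicting a time-frequency mask. A minimal NumPy sketch, using two synthetic tones as hypothetical stand-ins for speakers (frequencies chosen to align with DFT bins) and an oracle ratio mask, the quantity a trained separator tries to predict from the mixture alone:

```python
import numpy as np

fs, n_fft = 8000, 256
t = np.arange(fs) / fs

# Two toy "speakers": tones on exact DFT bins (437.5 Hz = bin 14,
# 1312.5 Hz = bin 42 at this frame size), hypothetical stand-ins.
target = np.sin(2 * np.pi * 437.5 * t)          # speaker of interest
interference = np.sin(2 * np.pi * 1312.5 * t)   # the "noise" in the text
mixture = target + interference

def frames(x, n):
    m = len(x) // n                 # non-overlapping rectangular frames
    return x[: m * n].reshape(m, n)

# Oracle ratio mask in the DFT domain, computed from the known sources;
# a learned separator approximates this mask from the mixture alone.
spec_t = np.fft.rfft(frames(target, n_fft))
spec_i = np.fft.rfft(frames(interference, n_fft))
spec_m = np.fft.rfft(frames(mixture, n_fft))
mask = np.abs(spec_t) / (np.abs(spec_t) + np.abs(spec_i) + 1e-9)
estimate = np.fft.irfft(mask * spec_m, n=n_fft).ravel()

err = np.mean((estimate - target[: estimate.size]) ** 2)
print(err)   # near zero, vs. ~0.5 for the unprocessed mixture
```

With spectrally disjoint tones the oracle mask recovers the target almost exactly; real overlapping speech shares time-frequency cells, which is what makes the learning problem hard.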


Using 'Cocktail Party Problem' to Talk with Animals

#artificialintelligence

Animals communicating with each other might seem simplistic at first glance. Compared to human communication, animals do not appear to use any particular language, merely noises, to communicate with each other. Many of the noises animals make are less a conversation in the present moment and more a call signalling natural changes, such as rain or nearby water, or the location of food some distance away. When it comes to artificial intelligence, plenty of progress has been made in the development of AGI by applying machine learning and neural networks to animals and to the understanding of animal behaviour. However, understanding the language of animals and communicating with them remains one of the longest-running fields of study in technology and the biological sciences alike.


Supervised vs Unsupervised & Discriminative vs Generative

#artificialintelligence

Highlights: GANs and classical deep learning methods (classification, object detection) are similar in some respects, but they are also fundamentally different in nature. Reviewing their properties will be the topic of this post. Therefore, before we proceed further with the GANs series, it will be useful to refresh and recap what supervised and unsupervised learning are. In addition, we will explain the difference between discriminative and generative models. Finally, we will introduce latent variables, since they are an important concept in GANs.
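The discriminative-vs-generative distinction the post recaps can be made concrete on toy 1-D data: a generative model fits the class-conditional distribution p(x | y) for each class (and can therefore synthesize new samples), while a discriminative model such as logistic regression fits p(y | x) directly. A minimal sketch, assuming two Gaussian classes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D data: two classes drawn from Gaussians (assumed for illustration).
x0 = rng.normal(-2.0, 1.0, 500)   # class 0
x1 = rng.normal(+2.0, 1.0, 500)   # class 1

# Generative approach: model p(x | y) per class, classify via Bayes' rule.
mu0, mu1 = x0.mean(), x1.mean()
def generative_predict(x):
    # equal priors and unit variance -> pick the nearer class mean
    return (np.abs(x - mu1) < np.abs(x - mu0)).astype(int)

# A generative model can also synthesize new data for a chosen class:
new_sample = rng.normal(mu1, 1.0)

# Discriminative approach: model p(y | x) directly (logistic regression).
X = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(500), np.ones(500)])
w, b = 0.0, 0.0
for _ in range(2000):                       # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((p - y) * X)
    b -= 0.1 * np.mean(p - y)

disc_acc = np.mean(((1.0 / (1.0 + np.exp(-(w * X + b)))) > 0.5) == y)
gen_acc = np.mean(generative_predict(X) == y)
print(gen_acc, disc_acc)   # both should sit near the Bayes-optimal ~0.98
```

The discriminative model only answers "which class?"; the generative one additionally supports sampling, which is exactly the capability GANs scale up to images.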


The cocktail party problem: Why voice tech isn't truly useful yet – TechCrunch

#artificialintelligence

On average, men and women speak roughly 15,000 words per day. We call our friends and family, log into Zoom for meetings with our colleagues, discuss our days with our loved ones, or, if you're like me, you argue with the ref about a bad call they made in the playoffs. Hospitality, travel, IoT and the auto industry are all on the cusp of leveling up voice assistant adoption and the monetization of voice. The global voice and speech recognition market is expected to grow at a CAGR of 17.2% from 2019 to reach $26.8 billion by 2025, according to Meticulous Research. Companies like Amazon and Apple will accelerate this growth as they leverage ambient computing capabilities, which will continue to push voice interfaces forward as a primary interface.


A Beginner's Guide To Attention And Memory In Deep Learning

#artificialintelligence

It might never have occurred to you how you make sense of what your friend is blabbering at a loud party. A party is full of all kinds of noise; how, then, are we perfectly able to carry on a conversation? This question is widely known as the "cocktail party problem". Most of our cognitive processes can pay attention to only a single activity at a time. At a loud party, our ability to direct attention toward one set of words while ignoring other sets of words, which are often overpowering, is still a conundrum.


Amazon releases Alexa data set to help solve the 'cocktail party problem'

#artificialintelligence

The cocktail party problem, alternatively known as the dinner party problem, is the difficulty automated systems encounter when tasked with isolating audio in noisy, multisource environments. It's widely studied, and a number of academic teams, startups, and corporate giants claim to have solved it with sophisticated machine learning algorithms. But Amazon believes there's room for improvement, and to this end, it's releasing a data set -- the Dinner Party Corpus, or DiPCo -- intended to spur research on the topic. According to Zaid Ahmed, a senior technical program manager in the Alexa Speech group, the corpus was created with the help of Amazon volunteers who simulated a dinner-party scenario in the lab. Over the course of multiple sessions (each involving four participants), the volunteers served themselves food from a buffet table and spoke over music piped into the room.


Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment

Xu, Jiaming (Chinese Academy of Sciences, Institute of Automation) | Shi, Jing (Chinese Academy of Sciences, Institute of Automation) | Liu, Guangcan (Chinese Academy of Sciences, Institute of Automation) | Chen, Xiuyi (Chinese Academy of Sciences, Institute of Automation) | Xu, Bo (Chinese Academy of Sciences, Institute of Automation)

AAAI Conferences

Developing a computational auditory model to solve the cocktail party problem has long bedeviled scientists, especially for single-microphone recordings. Although recent deep learning based frameworks have made significant progress in multi-talker mixed speech separation, most existing deep learning based methods focus on separating all the speech channels rather than selectively attending to the target speech and ignoring other sounds, and so may fail to offer a satisfactory solution in a complex auditory scene where the number of input sounds is usually uncertain and even dynamic. In this work, we employ ideas from auditory selective attention in behavioral and cognitive neuroscience and from recent advances in memory-augmented neural networks. Specifically, a unified Auditory Selection framework with Attention and Memory (dubbed ASAM) is proposed. Our ASAM first accumulates prior knowledge (that is, the acoustic features of one specific speaker) into a life-long memory during the training phase, while a speech perceptor is trained to extract temporal acoustic features and update the memory online when a salient speech signal is given. Then, the learned memory is used to interact with the mixture input so as to attend to and filter the target frequencies out of the mixture stream. Finally, the network is trained to minimize the reconstruction error of the attended speech. We evaluate the proposed approach on the WSJ0 and THCHS-30 datasets, and the experimental results demonstrate that our approach successfully performs two auditory selection tasks: top-down task-specific attention (e.g., following a conversation with a friend) and bottom-up stimulus-driven attention (e.g., being attracted by salient speech). Compared with deep clustering based methods, our method shows competitive advantages, especially in real noise environments (e.g., a street junction). Our code is available at https://github.com/jacoxu/ASAM.
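The memory-then-attend idea in the abstract can be caricatured in a few lines: store a spectral profile for the target speaker as a memory slot, then use its similarity to the mixture spectrogram as a soft mask that filters the target out of the stream. This toy sketch uses hypothetical Gaussian spectral profiles and a crude normalisation; it is not the ASAM architecture, only the selection intuition:

```python
import numpy as np

rng = np.random.default_rng(3)
n_freq, n_frames = 64, 100

# Toy spectrograms for two speakers with distinct spectral profiles
# (hypothetical stand-ins for the acoustic features in the paper).
profile_a = np.exp(-0.5 * ((np.arange(n_freq) - 12) / 4.0) ** 2)
profile_b = np.exp(-0.5 * ((np.arange(n_freq) - 44) / 4.0) ** 2)
spec_a = profile_a[:, None] * rng.uniform(0.5, 1.0, n_frames)
spec_b = profile_b[:, None] * rng.uniform(0.5, 1.0, n_frames)
mixture = spec_a + spec_b

# "Life-long memory" slot for the target speaker: a stored spectral profile.
memory = profile_a / np.linalg.norm(profile_a)

# Attention step: similarity between the memory and each time-frequency
# cell of the mixture yields a soft mask that attends to the target.
att = memory[:, None] * mixture
mask = att / (att.max() + 1e-9)          # crude normalisation to [0, 1]
estimate = mask * mixture

# Reconstruction error of the attended speech vs. keeping the raw mixture.
err_attended = np.mean((estimate - spec_a) ** 2)
err_mixture = np.mean((mixture - spec_a) ** 2)
print(err_attended, err_mixture)
```

Because the stored profile is near zero in the interfering speaker's band, the mask suppresses that band; the real system instead learns the memory contents and the attention interaction end-to-end from data.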